Skills: Displaying map data
Week 1
This page introduces some of the skill that are useful for displaying geospatial data on a map using R, Python, QGIS, or ARcGIS.
The R skills on this page use the following R packages:
library(tigris)
library(rnaturalearth)
library(sf)
library(here)
library(tidyverse)
library(leaflet)
library(ggspatial)
library(ggthemes)In order to run Python code from RStudio, I’m using the
reticulate package and setting up my RStudio environment so
it knows where I have Python installed. Just because you can
run Python in RStudio, doesn’t mean you should. You might be better
served by PyCharm or Jupyter notebooks.
library(reticulate)
Sys.setenv(RETICULATE_PYTHON = here("env_2128/Scripts/python.exe"))
use_virtualenv(here("env_2128"))The Python skills on this page use the following Python packages.
import geopandas
import pygris
import pandas
import matplotlib.pyplot as plt
from pyprojroot.here import here
from geopandas.tools import sjoinLoading data
A first step in working with geospatial data is loading it into the software you are working with.
R
Some packages (such as the tigris package) will load
data directly from an online source (so you need to be connected to the
internet for these functions to work).
Other packages (such as rnaturalearth) include
geospatial datasets (the data was installed on your computer along with
the package, so they will work regardless of whether you are connected
to the internet).
The st_read funtion (from the sf package)
will read data from a file in any common data format.
For shapefiles, use the name of the directory containing the set of files.
Python
pygris is the Python version of tigris and
can download TIGER files via the census API. It is not as complete as
tigris.
You can load spatial data from a file using the
read_file function in the geopandas package.
To read shapefiles, reference the directory containing the set of
files.
QGIS
In QGIS, you can load a layer by dragging and dropping it into the main window. This can be either a single geospatial file (like a geojson file or a kml file) or a directory containing a shapefile (along with its associated files).
Once the dataset is loaded in QGIS, you’ll see it listed in your layers panel and displayed in the map panel.
Gif 1: Loading datasets in QGIS
ArcGIS
In ArcGIS Pro, click on the “Add Data” button on the Map tab, navigate to the shapefile you want to load, and select OK.
Once the dataset is loaded in ArcGIS, you’ll see it listed in your contents panel and displayed in the map panel.
Gif 2: Loading a shapefile in ArcGIS Pro
Filtering data by attribute
You may have loaded a set of features that includes some categories you aren’t interested in.
R
You can use the filter function to filter based on the
value of a variable.
QGIS
In QGIS, you’ll do this in two steps. First, you’ll select features based on an attribute value.
Gif 3: Selecting by attribute in QGIS
Then, you’ll export your selection as a new layer.
Gif 4: Exporting selected features in QGIS
Filtering data by location
You may have loaded data for a large area, and you’re only interested in the features within a smaller area.
R
You can use st_filter to keep only the features within a
polygon.
Python
In Python, you can do this in two steps:
- A spatial join (
sjoin) will add columns from the polygon containing each feature. Features that aren’t in a polygon will have NA (missing) values. dropnawill filter out the rows with an NA value for a column that comes from the polygon you used in the spatial join.
Viewing data in a map
It can be helpful to take an initial look at your data on a map. ArcGIS and QGIS will display your data in a map panel when you load it. In R or Python, you’ll need to create some kind of plot to visualize your data.
R (interactive)
The leaflet package in R can create interactive,
html-based maps. These can be useful for exploring your data even if you
ultimately want to be producing a static map. You can read about
leaflet here: https://rstudio.github.io/leaflet/
Set up the leaflet map with a call to the leaflet()
function. Use the addTiles() function if you want to
display your data over a default OpenStreetMap base map. Then use
addPolygons() to add polygon data,
addPolylines() to add line data, or
addCircleMarkers() to add point data.
A useful trick if you just want to explore your data is to specify a popup that will give you the value of a variable when you click on a feature.
R (static)
You can also make a quick static plot of a dataset using
ggplot.
Python (interactive)
The explore function in geopandas is useful
for creating a quick interactive map. By default, it will display the
layer over OpenStreetMap tiles and add popups displaying values for all
variables.
In the code below, I’m creating the interactive map and saving it as
an html file. I can embed that map within my knitted RMarkdown document
by adding the line
{width=100% height=500px}
below the chunk (not within any code chunk).
In a Jupyter notebook, you wouldn’t need to save and embed the map. It would just render below your code chunk.
Styling your layers
You can control the appearance of features on your map by defining some of the following attributes
- Polygons
- Fill color
- Fill opacity
- Fill pattern
- Line color
- Line opacity
- Line weight
- Lines
- Color
- Opacity
- Weight
- Points
- Size
- Shape
- Color
- Opacity
R (interactive)
In leaflet, addPolygons and
addCircleMarkers have the following optional arguments:
strokecontrols whether there will be a line border around the polygon. The default isTRUE.fillcontrols whether the polygon will be filled. The default isTRUE. If you set bothstrokeandfillto false, the polygon will be invisible.colorcontrols the color of the border.opacitycontrols the transparency/opacity of the border. It ranges from 0 to 1, where 1 is fully opaque and 0 is fully transparent.weightcontrols the line weight of the border.fillColorcontrols the fill color.fillOpacitycontrols the fill opacity. It ranges from 0 to 1, where 1 is fully opaque and 0 is fully transparent.
In addition, addCircleMarkers has an argument for
radius that controls the size of the circles.
addPolylines has optional arguments for
color, opacity, and weight.
leaflet() |>
addPolygons(data = texas,
popup = ~NAME,
stroke = FALSE,
fillOpacity = 1,
fillColor = "white") |>
addPolylines(data = TX_rails,
popup = ~FULLNAME,
color = "gray",
weight = 1,
opacity = 1) |>
addCircleMarkers(data = TX_cemeteries,
popup = ~FULLNAME,
color = "black",
radius = 0.5,
opacity = 1) R (static)
In ggplot, you can set the following optional arguments
in the geom_sf function:
colorsets the color of a point or a line, or the outline color of a polygon (setcolor = NAfor a polygon with no outline).fillsets the fill color of a polygon (setfill = NAfor a polygon with no fill).alphasets the transparency of a feature. It ranges from 0 to 1, where 1 is fully opaque and 0 is fully transparent.linewidthsets the weight of a line (or the outline of a polygon).linetypesets the style of a line. Options include:solidlongdashdotteddotdashdashed
sizesets the size of a pointshapesets the shape of a point. The most common options are indicated by numbers 0 through 25. Common options include- 20 for a small, filled circle
- 15 for a square
- 17 for a triangle
- 18 for a diamond
ggplot() +
geom_sf(data = texas,
color = NA,
fill = "cornsilk") +
geom_sf(data = TX_rails,
color = "black",
linewidth = 0.5,
linetype = "dotted") +
geom_sf(data = TX_cemeteries,
size = 4,
shape = 18,
color = "darkgreen",
alpha = 0.5)Styling your plot
You can also change the appearance of the plot itself.
R (static)
In ggplot, you’ll set the overall appearance of a plot
using the theme function. In the example below, I’m
removing the tick marks and axis text, setting the panel background to
be light blue, and setting the grid lines (which are latitude and
longitude lines in this case) to be dotted, darker blue lines.
ggplot(africa) +
geom_sf() +
theme(axis.text = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill = "lightblue"),
panel.grid.major = element_line(color = "blue",
linetype = "dotted"))Rather than messing around with all the details you can set in the
theme function, you can also use one of many pre-defined
themes. Most of these were developed for use with scatter plots rather
than maps.
For maps, you often don’t want to show grid lines or axis text.
theme_void will remove all of this from your plot.
The ggthemes package has a number of interesting pre-defined themes. Again, most of them were developed for scatter plots. Here is a plot in the Wall Street Journal house style.
ggthemes also has a function called
theme_map that gives you a nice, clean-looking map. You’ll
notice that it looks a lot like theme_void. The main
difference is in how the legend appears (not relevant in this example
that doesn’t have a legend).
Adding a legend
R (static)
To include an item in a legend, you specify its appearance a little
differently. In the example below, I replace
linetype = "dotted" with
aes(linetype = "Railroad") so that the item will appear in
a legend labeled as “Railroad.” Then I add the function
scale_linetype_manual to create a legend based on line
type, and I specify the line type for railroads within that
function.
I do something similar to create a color-based legend for cemeteries.
Note that this is actually creating two seperate legends (one for
line types and one for colors) so I’m using the
legend.title argument in the theme function to
remove all the legend titles.
ggplot() +
geom_sf(data = texas,
color = NA,
fill = "cornsilk") +
geom_sf(data = TX_rails,
color = "black",
linewidth = 0.5,
aes(linetype = "Railroad")) +
geom_sf(data = TX_cemeteries,
size = 4,
shape = 18,
aes(color = "Cemetery"),
alpha = 0.5) +
scale_color_manual(values = "darkgreen") +
scale_linetype_manual(values = "dotted") +
theme_map() +
theme(legend.title = element_blank())Adding a north arrow
A north arrow is not always necessary, but it can be helpful depending on your audience and on the purpose of your map.
R (static)
The ggspatial package includes an
annotation_north_arrow function that will add a north arrow
to your map. It includes the following arguments:
locationindicates where the north arrow will be displayed. Options includetl(top left),tr(top right),bl(bottom left), andbr(bottom right)styleindicates the arrow style. Options include:- north_arrow_orienteering
- north_arrow_fancy_orienteering
- north_arrow_minimal
- north_arrow_nautical
ggplot(africa) +
geom_sf() +
annotation_north_arrow(location = "tr",
style = north_arrow_minimal) +
theme_map()Adding a scale bar
R (static)
The ggspatial package includes an
annotation_scale function that will add a scale bar to your
map. It includes the following arguments:
locationindicates where the scale bar will be displayed. Options includetl(top left),tr(top right),bl(bottom left), andbr(bottom right)styleindicates the scale bar style: either"bars"or"ticks".
Adding annotation
The easiest way to annotate a static map may be to bring it into illustrator, but you can also do it within your GIS program.
R (static)
The annotate function will place text on your map. You
need to specify the following arguments:
geom: Setgeom = "text"to place text on the map.x,y: The x- and y-coordinates of the location the text should be placed, in the coordinate system of the map (if you’re plotting the map without grid lines, you might want to make a quick plot that shows them so you can find the right values to place your annotation).label: The text you want to place. There is no text wrapping; you can use\nwithin your next string to move to a new line.
The following optional arguments are also helpful:
hjust: The horizontal justification of your text. The default is 0.5, which indicates centered text (and x-coordinate you’ve used to place the text will be for the middle of the text). A value of 0 will give you left-justified text, and a value of 1 will give you right-justified text.vjust: The vertical justification of your text. A value of 0.5 means the y-coordinate you’ve specified is for the center of the text. A value of 0 means the y-coordinate is for the bottom of the text. A value of 1 means the y-coordiate is for the top of the text.
ggplot(open_space) +
geom_sf() +
annotate(geom = "text",
x = -71.1,
y = 42.25,
label = "This is a map of\nopen space in Boston",
hjust = 0,
vjust = 1) Saving your map
Please don’t just take screenshots of the maps you create in a GIS program. You can save them to a variety of file types.
R (static)
The ggsave function give you a nice easy way to save a
ggplot map. By default, it saves your most recent plot. You
should specify the following arguments:
filename: The name of the file.ggsavewill guess the file format (e.g. jpg, png, pdf) based on the filename’s extension. You can pipe the path from thehere()function to make sure you end up in the right directory.width,height: The dimensions of the image you want to save.units: The units of the dimensions you’ve specified, one of:"in"for inches"cm"for centimeters"mm"for millimeters"px"for pixels
For raster images (e.g. jpg or png), you should also specify the
resolution with dpi=.